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ABSTRACT. We introduce an approximation method to solve an optimal control problem via 
the Lagrange dual of its weak formulation. It is based on a sum-of-squares representation of 
the Hamiltonian, and extends a previous method from polynomial optimization to the generic 
case of smooth problems. Such a representation is infinite-dimensional and relies on a particular 
space of functions — a reproducing kernel Hilbert space — chosen to fit the structure of the 
control problem. After subsampling, it leads to a practical method that amounts to solving 
a semi-definite program. We illustrate our approach by a numerical application on a simple 
low-dimensional control problem. 


1. INTRODUCTION 


The continuous-time optimal control problem (OCP) is a versatile framework modelling a wide 
variety of nonlinear systems and optimization criteria, with countless industrial applications, in- 
cluding aerospace or robotics [MLS94]. Developing efficient numerical methods for solving 
such a general problem is a daunting task, especially for high-dimensional systems. Among current 
methods, indirect methods exploit optimality criteria derived from Pontryagin's maximum principle 
and give precise results but need to be initialized properly, while the less accurate direct methods 
reformulate the problem as a nonlinear program without specific initialization requirements 
Chapter 9]. 

In this paper, we focus on a direct method that computes the optimal value function of the 
problem as the maximal subsolution of the Hamilton-Jacobi-Bellman (HJB) equation. It is de- 
scribed in and is obtained by taking the dual of the weak formulation of 
the OCP, involving occupation measures [Vin93]. In [LHP TOS], the numerical resolution of this 
formulation is based on polynomial optimization |Lasl5|, and hence is restricted to polynomial 
dynamics and cost functions, with a possibly costly extension to smooth functions, involving a 
hierachy of semi-definite programs (SDPs). 

Our main contribution is to extend the numerical method of to non-polynomial 
and smooth OCPs. To this end, we consider a space of smooth functions, called a reproducing 
kernel Hilbert space (RKHS) [Aro50], and use representations of non-negative functions in this 
space with sums-of-squares. This is detailed in Sections |2| and In Section |4| we 
notably prove that this representation is exact in two particular cases: the time-invariant linear 
quadratic regulator (LQR) and smooth control-affine systems. This results in a practical numerical 
method derived in Section |5| that requires to solve an SDP to approximate the optimal value 
function of the OCP. Finally, in Section [6] we illustrate the practical application of this method to 
a simple two-dimensional OCP. 


2. BACKGROUND 


First, we introduce the three building blocks that are then combined in Section B]to design our 
approximation method. 


E-mail address: firstname.lastname@inria.fr. 


2.1. Formulation of OCP with Maximal Subsolutions of HJB. Let X and U be compact 
subsets of IR and R?, for integers d, p > 1, and assume that U is convex. We define the dynamics 
f : [0, T] x X xU — R7, the running cost L : [0, T] x X xU > R, and the terminal cost M : X —> R 
occurring at the fixed terminal time T > 0. In addition, we assume that the state trajectories 
remain in the compact set X, and that sampling points from ¥ is easy. We do not consider 
problems with explicit state-constraints which are left as future work. 

Assume the existence of a smooth function V* € C!(#r), meaning that V* is smoothly differ- 
entiable on XT := [0, T] x X, that is a solution of the HJB equation: Y(t, £) € Xr, 

PV ta) + inf (L(t, x,u) + VV*(E, a)" f(t, £, u)) = 0 

V*(T,2) = M(x), (HJB) 


where VV* refers to the gradient of V* w.r.t. x. For t € [0, T], let U the set of admissible controls 
such that Vs € [t, T], (z(s),u(s)) € X xU. Then V* is the value function [Lib11| of the following 
OCP: V(to, zo) € AT, 


T 
V* (to, £o) = int f L(t, x(t), u(t))dt + M(x(T)) 


uEUo J to 
vt € [to, T], a(t) = f(t,2(t),u(t)), 2(0) = 20. (OCP) 


Let uo be a probability measure on 4. We are interested in the value of the stochastic initial point 
problem, where zo is drawn according to po, that is, Ezy. [V* (0, z9)] = f V*(0,xo)dgo(xo). 

In this paper, instead of directly looking for solutions of the HJB equation, we will focus on the 
following alternative problem (P). namely, finding a maximal subsolution of HJB: 


sup [vo zo)dpto(xo) 
VeCci(Xr) 
V(t, x,u), ar (hot) + L(t x,u) + VV(t,2) f(t, x,u) > 0 
Va, V(T, x) € M(x). (P) 


This is the dual of the weak formulation of the OCP with occupation measures, which is a linear 
program over the space of measures IKSEL17]. Moreover, subsolutions of HJB also 
play a key role in the theory of viscosity solutions |CL83| of partial differential equations. The first 
constraint in is the positivity of a certain Hamiltonian associated to V, namely: 














H(t,x,u) := vt z) + L(t,z,u) + VV (t,x)! f(t,2,u). 


If V = V* is the optimal value function, then the optimal controller u = u*(t,x) minimizes the 
positivity constraint and for all (t,x) € Xr, H*(t,x,u*(t,x)) = 0. 

Our goal is to find an approximate solution V of (P). Under some additional assumptions, (P) 
is equivalent to the OCP. Such regularity and convexity assumptions were first studied by m 
and are detailed in [LHPT08]. In this particular case, the value of problem coincides with the 
one of the stochastic initial point problem: 


sup P — "roro [V" (0, 29)] . 














2.2. Parameterization of the Value Function. A first difficulty in problem (P) is searching 
V in the infinite-dimensional set C!(.Xr). One option is to search V in a finitely-parameterized 
set Fo. A common practice, notably in approximate dynamic programming and reinforcement 
learning Chapter 9], is to use a linear approximation of V, with a feature vector w(t, x) € R™ 
and a parameter 0 in a convex subset O of R”, for m > 1. 

Since we assumed that V* is a solution of the HJB equation, we can restrict that search space 
in to functions V such that V(T,.) = M(.). Then we assume that the parameterization is 
such that VO, Vo(T,.) = M(.), so that we can remove the explicit constraint in (P). Hence our 
parameterized set is: 

Fe :={(t,2) = Vo(t,x) = 0 W(t, x) + M(x) | 6 € 0), 
with v such that ~(T,.) = 0. To simplify the evaluations of VV» and 2”, it is convenient (but 


Ot ^ 
not necessary) to use a separable feature vector Y(t, x) = &(t)e(z), with «(T) = 0. 


2.3. Representing Non-Negative Functions as Sum-of-Squares. Problem is constrained 
by a dense set of inequalities indexed by (t, x, u) € [0, T] x A x, which cannot be directly handled 
by numerical algorithms. Hence we look for a finite — possibly approximate — representation of the 
non-negative function (t, x,u) — H(t,x,u) > 0. 

If f, L, M are polynomials, one way is to use sum-of-squares (SoS) polynomials [Las15|, i.e., 
to represent H(t,x,u) as the sum of the squares of polynomials of a given degree. This is a 
sufficient but not necessary condition for being a non-negative polynomial. This technique has 
been applied to problem (P) in [LHPTO8|, although it is presented in its dual version using the 
method of moments |Hen13|. In any case, this representation is not exact in general and gives 
a lower approximation of (P). To numerically solve the problem, one needs to build a hierarchy 
of SDPs obtained by this SoS representation with polynomials of increasing degree r. Under 
generic conditions, this hierarchy converges to the value of (P). but the speed of convergence is not 
explicitly controlled [PHL17]. This is a potentially critical issue, since the size of the SDP at rank 
r is defined by the number of monomials of degree less than r in the dimension of (t, x,u), which 
is pug TTE a quantity growing exponentially with r. 

In this paper, we opt for another option inspired by recently-introduced machine learning tech- 
niques [RMFEB20|: representing a non-negative function as a SoS in a reproducing kernel Hilbert 
space (RKHS). Hereafter, we briefly define an RKHS and mention its main properties, and refer 
to for a thorough description. Consider a set E, a function k : E x E — R is a positive 
definite kernel if Vn € N, y1,..., Yn € E, the matrix K :— (k(yi; yj))?;=1 is positive semi-definite. 
Associated to a positive definite kernel k, there exists a unique RKHS H, a Hilbert space of func- 
tions E — R, with an inner product {:,-)4, such that the following properties hold (the second 
one is the so-called "reproducing property"): 

e Vy € E, ky :— k(y,-) € H; 

e Vg EH, Vy € E, (g, ky) = gly): 
In addition, there exists a feature map ® : E — H, possibly infinite dimensional, defined by ®(y) = 
ky, which maps a point in E to a function in H, and in particular we have k(y, y") = (®(y), ®(y’)) a. 
Conversely, any feature map defines an RKHS associated to the former kernel. 

Here we mention two classical kernels from the RKHS literature, which we use later to build 
our function representations. Assume that E is a subset of IR^, for l > 1. The polynomial kernel of 
degree r is defined on E x E by k(y, y) = (1--y! y)", and the corresponding embedding ®(y) is the 
vector of (e multivariate monomials of degree less than r. In this case H is finite-dimensional. 
The exponential kernel is defined by k(y, y’) = exp(—||y—y’|l2/c), with o > 0. If E is bounded and 
has locally Lipschitz boundary, the corresponding RKHS is the Sobolev space of functions whose 
weak-derivatives up to order s = £/2 + 1/2 are square-integrable [BTA1]]. 

Functions that are SoS in an RKHS H can be represented using an infinite-dimensional positive 
semi-definite operator Corollary 1]. Indeed, assume that a function h € H is written as 
a sum-of-squares of functions hj € H: 


Vy € E, h(y 2 M, h;(y). 


j=l 
For any v,w € H, we have: 

(w, (v & v)w) = (w, vv*w) = Tr(w*vv*w) = ((v,w))?, 
where & denotes the outer product. Because of the reproducing property: Vj € {1,...,q}, y € E, 


h;(y)? = ((®(y), hin)? = (l), (hy & h;)9(y)). 
And then: 
h(y) = (®(y), A®(y)) x, 
with A :— 375 4, A; @ hj € S;(H) and A has rank at most q, where S;(H) is the set of bounded 
self-adjoint positive semi-definite operators on H. 

In |MFBR20|, this SoS representation in an RKHS is used to model non-negative functions, e.g., 
for signal processing or statistics applications. In some cases, e.g., in Sobolev spaces as we will see 
hereafter, the representation is exact in the sense that all non-negative functions can be written 
as a SoS in H, whereas the polynomial SoS representation is tight only for a restricted class of 


polynomials [Las09|. Besides, SoS polynomials are a particular case of what has just been described, 
if k is the polynomial kernel. In the rest of the paper, we will extend the method of |LHPT08 
from polynomials to any RKHS, at the expense of possibly infinite-dimensional representations. 


3. DENSE SET OF INEQUALITY CONSTRAINTS 


In this section, we start by providing a basic relaxation of problem (P| and a motivation for 
preferring a SoS representation of the non-negativity constraints in (P). Then we present the 
resulting problem and its main features. 


3.1. Relaxed formulation by subsampling. A straightforward relaxation of is obtained 
by finitely subsampling the non-negativity constraints. Let us sample values of (t, (9, u)j¢7 
in [0, T] x AX x U, with I a finite set of cardinality n > 1. For simplicity, assume that wo is the 
mean of no Diracs at points {a en a no}: Besides, let us use a linear parameterization of V 
as described in Section with © = R™. We then obtain a linear program, with a possibly 
unbounded solution in the overparameterized setting (m > n). To circumvent this effect, we add 
a quadratic regularizer on 0 with parameter Ag > 0, and obtain the following problem: 


1 © 
sup — Y 67 9(0, 239) + M(z$) — roll6ll3 
germ No 1— 


TEE GO, 20) + LE, 0,469) me) 





Wiel, 0 


: : AINGE : ; f 
+ (oT vu, 2) + VM (a?) FE, ec, u) > 0. 


Although this is not exactly a linear program if Ag > 0, it can still can be solved easily by standard 
solvers, and we will refer to it as the LP problem. A similar finite-dimensional LP formulation has 
been proposed in for discounted infinite-horizon control problems. It is part of a long series 
of LP formulations for optimal control (see, e.g., and references therein), for dynamic 
programming and more recently for reinforcement learning [LMMN21|. 

This problem will be used as a baseline in Section [6] to be compared with the SoS formulation 
below. It is a relaxation that gives an upper-bound on (P). but it is not straightforward to relate 
the number of samples n and how they are spread to the quality of the approximation. Yet in the 
example below, this can be evaluated explicitly. 

Example 1: Let g : RP? — R be a smooth function with a unique minimizer u*. With 
T=1,L:(t,2,u)4 glu), f 2-0 and M = 0, then V*(t,z) = (1— t)g(u*) and solving the 
OCP is essentially equivalent to finding the global minimizer of g. If V is parameterized by 
Vo(t, x) = 0(1— t), the LP formulation with Ag = 0 writes: 


supÓ s.t. Vi € I, —0 + glui) > 0, 


which is readily solved by 0 = min(g(ui)hier. In general, this method requires O(e ^?) samples 
to approximate g(u*) with precision € [Nov06], and yet when g is smooth, this is not an optimal 
way to perform zero-th order optimization. Indeed, use a SoS representation of g — 0 to 
solve this exact problem, and alleviate the curse of dimensionality when g is sufficiently smooth: 
the number of samples reduces to O(c-?/*) for g € C“(R?). In the rest of this paper, we propose 
to use the same approach and to generalize it to any OCP. We expect similar benefits when H is 
smooth. 


3.2. Strengthened formulation by SoS representation. Consider an RKHS H of real-valued 
functions on [0, T] x ¥ x U, with positive definite kernel k, and ® : E — H the corresponding 
embedding. We use a SoS representation in H, or “kernel SoS”, for the constraint H > 0 in (P): 


sup [vo zo)duo(zo) 
VeCl(Xr), 
ACS4.(H) 


V(t,z,u), H(t, 2,u) = (®(t, v, u), AB(t, x, u)). (KSOS) 


This is a strengthening of the constraint in (P). since being SoS is stronger than being non- 


negative. So in general, (KSOS) is a lower-approximation of (P). However, in certain cases, (KSOS) 


can be equivalent to (P). as we will prove in Section |4| A sufficient condition is the existence of 
A € S4 (H) such that, at the optimal V*: 


V(t,z,u), H*(t,z,u) = (P(t, x,u), A(t, x, u)). 


4. TIGHT SUM-OF-SQUARES REPRESENTATIONS 


We study the tightness of problem (KSOS) in two particular cases: the time-invariant LQR and 
smooth value functions. 


4.1. Case 1: Infinite-Horizon Time-Invariant LQR. First we look at a very simple OCP, 
where every quantity can be computed almost in closed form, and with infinite-horizon so that 
there is no dependence in t. Let f(x,u) = Aox + Bou, for Ao € R?*4, Bo € R?*?, with (Ao, Bo) 
controllable, L(z,u) = xz! Qox + u! Rou, Qo € S4 (R*d), Ro € S,(R?*?), Ro > 0. The optimal 
value function is V*(x) = x! Sox, where So is the unique positive semi-definite solution of the 
algebraic Riccati equation: 








0 = —Qo — Aj So — Soo + So Bo RG | Bg So. 
The optimal controller is u*(x) = —Rj'Bj Sox =: — Kox. 
H* (x,u) = zr Qoz +u! Rout 2x! So(Aot + Bou) 
= u! Rou + x! SoBou + u' Bo Sor + x! SoBoKox. 
This is a SoS of degree-one polynomials in (a, u): 


H* (x, u) = (u + Kox)! Ro(u + Koz) 


= (ar u’) ey Ro(Ko Ip) (*) 


DCE 


j=l 


with q;(z,u) :— [RiP]; (Ko Ip) E 


Hence, an infinite-horizon, time-invariant LQR with unknown parameters can be equivalently 
expressed by: 


sup [ v «9 (to)duo(xo) 
ad, N-0 


V(r,u) L(x,u) + VV(z)! f(z,u) = oi N (*) | 


In the next section, we prove that similar SoS constructions exist for sufficiently smooth OCPs, 
possibly with an infinite-dimensional embedding (v.s. a (d + p)-dimensional one here). 


4.2. Sum-of-Squares Decomposition with Smooth Functions. We show that, for smooth 
and control-affine OCPs, H* is a SoS of smooth functions. Let Q := Int(A), Qi := Int{(t, x) € 
Xr | argmin,c;, H*(t,x,u) C Q2}, and Q :2 Q x 93. 


Theorem 1. Lets €N, s » 1. Assume that: 
e f is control-affine: V(t,x,u) € [0, T] x X xU, 
f(t, v, u) = g(t, x) + Bit, x)u. 
e For all (t,x) € Q1, ut L(t,x,u) is twice differentiable on Q2 and strongly convex: V2 L(t, x, u) > 
pl for some p > 0, and (t, v,u) œ> V2L(t, x,u) € C*(Q). 
e (t, x,u) 9 VuL(t,z,u) + B(t, x)! V,V*(t,x) € C8(Q). 
Then there exist p functions (w;)i<j<p € C*(Q) such that: 


p 
V(t,z,u) EQ,  H*(t,x,u) p (t,£, u)? 


The proof is in the appendix. This result motivates the use of exponential kernels, inducing a 
Sobolev space RKHS, to represent the non-negativity constraints in (P) for smooth OCPs. When 
s > d/2 4-1, by applying a technique similar to the one used under Corollary 2 of [RMFB20], 
it is possible to obtain a SoS representation in terms of the exponential kernel. Then is 
equivalent to (P). 


4.3. Stochastic Smoothing of the Optimal Value Function. However in general, V* is not 
necessarily smooth (e.g. minimal time problems), nor is u* (e.g. bang-bang controllers that are 
not even continuous). Here, we provide a generic technique to give some regularity to V*. For 
n > 0, we can define a perturbed version of the control system [FR12], where the state is a random 
variable X;: 
dX, = f(t, Xy, w)dt + (2) ^B, 

where B, is a standard Brownian motion independent of X+. We define the optimal value function, 
with Xo = Zo: 














Tp 
V” (to, zo) = m y / L(t, X4, w)dt + M(Xr) 
u to 
V" is the unique Ct?(¥r) (C! in t, C? in x) solution [FR12] of the following regularized HJB 
equation: V(t,z) € Xr, 


inf {L(t,x,u) 4-VV"(t, a)! f(t, £, u)} 


" 
mu x) - nAV"(t, x) = 0, 


with V"(T,z) = M(x). AV" refers to the Laplacian with respect to x. Contrary to HJB, the 
solutions are at least C!?(4) because this is a quasilinear parabolic partial differential equa- 
tion [Lie96]. The regularization "AV is a vanishing viscosity term, and the optimal controller is 
still in argmin,, L 4- VV”! f. Generically, V? converges to V* as 7 — 0, following a reasoning 
similar to the theory of viscosity solutions ; 


5. SDP FORMULATION AND ITS NUMERICAL RESOLUTION 


5.1. Finite-Dimensional Formulation via Subsampling. Similarly to the (LP) formulation 
that relaxes (P), we will now derive a relaxation of problem (K505), which is another relaxation 
of problem ort the SoS representation of H* is tight. Going through as an intermediate 
step will help to exploit the structure of (p). Using a parameterization of V in Fo with © = R”, 
and a set of sampled points (t9, x9, u(0);-; in (0, T] x AX x U, with |I| = n, we obtain: 


SUD Aes, (H),0cO c' 6 — Aolo} — ATr(A) + C 
such that Vi € (1,..., n), 
bi +a; 8 = (O49, 2,0), Aa (109,209, 402), 


with c := D, uP YO, 2), € :— D, (P M (a9), b; = LO, 2, u(0)-- V M (aO) ft, rO, u) + 
nAM (x), and a; :— Jy (tO, (9) f (£O), 2, ul) 4- 99 40, a 9) -nAv (tO , 2), where Jy denotes 
the Jacobian matrix of i» with respect to x only. Note that we integrate the stochastic smoothing 
process in this formulation, with parameter 7 that can be eventually set to 0. 

The regularization parameter > 0 controls the trace of the infinite-dimensional operator A, 
and allows for subsampling to provably recover the non-subsampled program when n tends to 
infinity, and À goes to zero at the proper rate (see for the precise dependence). In the 
limit À — 0, we recover the LP formulation where we assume nothing about the SoS representation 
of H* in H. 

Both the operator A and the 6(¢@,2,u) can be infinite dimensional, depending on the 
RKHS H. Yet, following [RMFEB20|, we can reformulate the problem equivalently in finite dimen- 
sion. Using the representer theorem in [MFBR20], one can prove that A can be sought in the form: 
for D € R"?*", D 7-0, 


A= Y Dj Ft, a, u(9) @ e(t, sO, uO). 


i,j—1 


Simple computations detailed in |RMFB20] show that: 


Vi, (6,2 ,u), ABE, 2, ul) = [KDK] 
Tr(A) = Tr(DK), 


di? 


where K is the kernel matrix with entry (i, j) equal to k ((t?, x(9, (0), (£0, 2, u)). Assume 
that K > 0. We denote by K = R'R the Cholesky decomposition of K, with R an invertible 


upper-triangular matrix. 
Let B:= RDR! and for 1 < i < n, 6; := R4. Then: 


Tr(B) = Tr(DK) = Tr(.A), 
{ [KDK]; = [R' BR],, = ©/ BO. 


ii 
The problem can now be reformulated as a finite-dimensional SDP over the positive semi-definite 
matrix B € R"*”: 


Suppg.ooeg» C 0— Aello — A Tr(B) -- C (SDP) 
such that Wi € {1,...,n}, b; +a) 0 = (9;)' BO. 


An important question is to estimate the number of subsampled inequalities sufficient to ensure 
that (SDP) © (KSOS). If nothing is assumed on the structure of H, as in the LP method, this 
number is infinite. In contrast, the kernel SoS representation can reduce it or make it finite. If H 
is a polynomial of degree 2r, k is the polynomial kernel of degree r, then n > 2r distinct sampled 
points are enough to interpolate H, and (SDP) = (KSOS). Another example is global optimization 
of smooth functions (see Example 1) with the exponential kernel. We refer to for the 
analysis of the convergence rates, with a lower dependence in the dimension for the kernel SoS 
when compared to direct inequality subsampling (corresponding to the LP approach). 


5.2. Interior Point Method with the Damped Newton Method. Problem (SDP) can be 
readily solved by any off-the-shelf SDP solver. However, for large n, this quickly becomes too 
computationally demanding. Here, we propose a numerical scheme based on that scales 
better with the number of subsamples n. First, we introduce a slack variable 6 € R” allowing the 
constraints to be slightly violated (e.g. because Fe is not a perfect model), controlled by a large 
parameter y > 0. Second, we introduce a log-barrier term controlled by a small € > 0, useful to 
form the dual of the SDP. We obtain the following problem: 
sup c!6—ATr(B)— 1812 — ^||9||? + elogdet B+ C 


B0, 
0,5 


such that Vi € {1,...,n}, bi +a; 0 = (®;)' B®; + ôi- 





The Lagrange dual of this problem reads: 


2 
n 1 m n 
infaer” 5 aibi + D x (s E 5 sm) 
ici 8 j—1 i—1 
1 
—elog det U (a) + pelle +enlog(e/e) + C, 


where U (a) := M, + 9! Diag(a)®, and 9 := R is the matrix with rows (®;)1<i<n. Let us call 
the objective F(a). 

Since F/e is self-concordant like in [RMFB20|, we propose to use damped Newton 
iterations on F/e: 


—— À 
MESE ION Q, 


where A(a) := —[F”(a)]~!F’(a) is the Newton direction and A(a) := Ao F”(a)Aa/e is the 
Newton decrement. The gradient and Hessian of F are computed by: 


OF 1 i i 
Ba; =b; + Dy 22 (s Dx T a 


—& 6) U(a)14. 





3? F 
0a,00; 








li-; 


EX p Tp (18.1? 
"qu 2, ouais te [8 U(o) 9| + 25 


At optimum, the value function is recovered by 


n 
0* — x: (Sarate) ; 

i=1 
and the dual variable a* plays a role similar to an occupation measure [Vin93], although it is 
not necessarily non-negative. To improve numerical stability in the experiments hereafter, we 
used an homotopy heuristics that progressively decreases the parameters Ag and €. Moreover, 
parallel implementations are possible because no singular value decomposition is needed, only 
matrix operations and system inversions. 


6. NUMERICAL EXAMPLE 


In this section, we apply the kernel SoS method along with the basic LP method, on a two- 
dimensional control problem, namely the double integrator with finite horizon. 
Setting. The problem is an LQR, as in Section but with finite-horizon T = 1, d= 2, p= 1, 


M(x) = |\x|l3, 
0 1 0 
Ao = (5 o) = (0) + = Ro rna. 


The optimal value function and controller are V*(t,x) = x! S(t)z, u*(t,z) = —Rg! Bg S(t)x =: 
—K(t)x, where S(.) is the positive semi-definite solution of S(T) = I» and: 


S(t) = -Qo — Aq S(t) — S(t) Ap + S(t) Bo Rs ! By S(t). 


Parameterization of V. Let Va(t,z) = 0 w(t,z), where each entry of v is a product of basis 
functions on ¥ and [0, T]. Let v(x) := (1,21, 22,2123, 22, 12)! , because we know V* is quadratic 
in z. For « on [0, T], we only know that it is a smooth function, so we use an approximate basis of 
the Sobolev space of functions with squared integrable derivatives: a sequence of sines and cosines 
with decreasing periods beginning with 2T to avoid constraining V(0,.) = V(T,.), and &(T) = 0, 


ensures that V(T,.) 2 M(.): 
OR l. wrt—-T\\" 
a(t) = | z sin | 5 78 TE 


Finally, V;,6;(t, x) :— vi(r)&;(t), and 0 € R”, m = 6m,. We choose m, = 10, for which the 
performance of the policy of the projection of V* on Fe is almost perfect. 
Evaluation. We give two criteria to evaluate the quality of an approximation V. First, the distance 
to V*: ||V — V*|?, where V is the vector of its values on a regular grid on [0, T] x [-1, 1]? with 
10 x 10 x 10 points. Second, the cost of the policy on a 10 x 10 regular grid of initial points. 
Sampling. The set of samples (t, 2, u(?);c; is built as follows. The z® are n, points in [—1, 1]? 
generated by the Sobol sequence [Sob67], the (ul) )1<;<», are on a uniform grid on [—10, 10] and 
the gae on [0,7]. The sample set is the Cartesian product of the three previous ones, and 
has n = n;n4n,, elements. We also use the same samples as initial points a, a) in the objective 
function of problem (p). Note that we have replaced it with $77 , V (t9, z(?)/n, as we found it 
more efficient in our experiments to optimize over V at intermediate time steps rather than at to 
only. Indeed, we ultimately evaluate our approximation by the accuracy of V on the whole #7 and 
not only on {to} x X. In a discrete states and actions setting, this effect is analyzed in [DFVR03], 
where po is denoted as “state-relevance weights". 
Methods. We compare three methods: the LP, the guided SoS and the kernel SoS. The LP method 
is detailed in Section [3.1] and as for the kernel SoS method, we add a slackness parameter on the 
constraints, with a penalization controlled by + > 0 (y — oo recovers the original LP). 

The guided SoS method is the same as the (SDP) problem, except that the embeddings ®; of 
the samples are replaced by vectors V; of fixed dimension, which are computed explicitly, without 
a kernel. Motivated by the fact that: 


H* (t, z,u) = (u+ K(t)z)! R(u + K(t)z), 


Distance to V* 


— LP 
74 —— guided SoS 
— kernel SoS 
64 --— projection 
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FIGURE 1. Comparison of the performances of the value function and the policy 
of the three methods, as a function of the number of samples n, = ny. 


we choose the embedding vectors as follows: 
V; :— (u /10, 2, (1/wsinwr/2(t/T -1))t), neq) 


where the last q scalar terms (without the vector x) approximately model K(.) as a smooth 
function of t. For computational efficiency, we choose q; = 5 and we checked that this basis can 
approximate the entries of K (t) well. Then we solve an SDP of size (p + qd) x (p+ qd) = 11x 11 
instead of n x n for the kernel version. 

The kernel SoS method is as described in the previous sections, with the following kernel: 


k((t, x, u), (t, 2, u’)) = (u, w)/100 + (x, z^) x exp(—|t — t’]). 


This kernel is also designed to match the shape of H*, with a smooth term in t modelled by the 
exponential kernel. The matrix K can be singular, so we replace it by K + 10787. 

Results. We compare the performance of the three methods to a baseline: the projection of V* 
on Fe, which is a proxy for the best performance to expect with a fixed Fe. We keep the best set 
of hyper-parameters after a grid search on (Ag, y) for the LP method, and on (Ae, vy, À) for the two 
others, with e = 10-4. We keep 7 = 0 and n; = 20 in all the experiments, and a varying number 
Ny = n, of sample points. For example, with n, = ny = 20, the dual variable a of the largest 
problem here has dimension n = 8000, and solving the numerical problem written in Python takes 
a few minutes on a standard laptop. 


The results are presented in Figure The guided and kernel SoS methods perform similar, 
and better than the LP: they better exploit a fixed number of samples than the LP. Note that the 
kernel SoS tends to the LP when À tends to 0, hence using a positive À improves the results. We 
believe that the design of a kernel adapted to prior knowledge on the problem is crucial to benefit 
from this effect. Finally, the kernel SoS has the same performance as the guided SoS, but it is 
computationally more expensive as soon as n > 11. Yet the kernel version extends way beyond 
such fixed finite-dimensional embeddings, to infinite-dimensional embeddings represented by any 
positive definite kernel, including the exponential kernel, the polynomial kernel and many others. 


CONCLUSION 


The kernel SoS approximation method generalizes the polynomial SoS method for OCPs. Like 
the simple LP method, it is black-box in the sense that it is based only on function evaluations 
of the dynamics and loss, without requiring any gradients. Moreover, it enables to exploit prior 
knowledge on the structure of an OCP, by choosing an appropriate kernel. The problem reduces 
to an SDP, whose size can be computationally limiting, but parallel implementations are possible. 
There are several sources of approximation in this method: the parameterization Fe of V might 
not be exact, the SoS representation of H* is not exact in general (although we have proved it 
is in a few particular cases), and we subsample a finite number of constraints (further work is 
needed to evaluate the effect of this step). In particular, it seems essential to assess in which 
cases the subsampling step is tight with a finite number of samples, or approximately so, in such 
a way that the overall process gives a certified lower-bound on the OCP, similarly to [LHPTOS]. 
For all these reasons, the method will probably not reach high precision solutions, but can be 
used to initialize direct shooting methods, and returns an approximate solution even with very 
few samples. Furthermore, we believe it is possible to extend the method to also account for state 
constraints, similarly to [LHPTOS8]. One could also parameterize the value function directly in an 
RKHS. Another interesting extension is to apply the method to Markov decision processes, where 
we could deal with states or actions that are more complex objects (graphs, trajectories, DNA 
sequences...) with appropriate kernels. 
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APPENDIX A. PROOF OF THEOREM 1 


Proof. Consider the Hamiltonian with f control-affine: 


* 


OV 
H*(t,æ,u) = VV* (t,x) glt,x) + op 2) 
+ L(t,z,u) + VV*(t,2)' B(t,x)u. 
Pontryagin’s maximum principle states that: 
V(t, x) EO, inf H* (t, x, u) = 0. 
Since L is strongly convex in u, the minimizer is unique and we call it u*(t,x). By definitions of 
Qı and Q2, we have a mapping u* : Qı — Qə and it is characterized by: 
Va H* (t,z,u) = VuL(t, x,u” (t, z)) + B(t,zx)! Va V* (t, x) 
=0. 

Since (t,z,u)  V2L(t,x,u) is continuous on Q and invertible, and (t, x,u) > V,H*(t,x,u) € 
C*(Q), then the implicit function theorem ensures that u* € C?(Q,) (see [Sch81], Chapter 8, 


Theorems 25 & 31). 
For (t, 2, u) € Q, we use Taylor's formula around u*(t, x): 


H* (t,x, u) = H* (t,x, u* (t, x)) 
+ V4H* (t, z,u* (t,2))! (u — u*(t,x)) 
+ (u — u*(t,2))! R(t, v, u)(u — u* (t, £)), 


1 
R(t, x, u) := | (1— 7)V2 H* (t, x, (1 — T)u* (t,£) + ru)dr. 
0 
Since H* (t, x, u*(t,z)) = 0, V, H* (t, x, u* (t, x)) = 0 (by definition), and V2 H* (t, z,-) = V?L(t,x,:), 
we have: 
H*(t,z,u) = (u — u*(t,2))! R(t,x,u)(u — u*(t,x)), and 


1 
R(t,x,u)— J (1— 7)V2L(t,z, (1 — r)u* (£2) + ru)dr > ae 
0 


For (t,z,u) € Q, R(t,x,u) has a positive-definite square root ,/R(t,xz,u). Also, Vr € [0,1], 
(1 — r)u* (t, x) + ru € Qe because Int(A) is convex like U. 
Since Vi, j, 52 € C*(Q), u* € C*(0,), and y- is C? on (M | MT = M, M > $I}, then 
rij: (t2, u) +> ej / R(t,z,u)e; € C“(Q), and we have the decomposition: 
p 
H*(t,x,u) = X uilt, z, u}, with 


w(t, x,u) := y R(t, x,u); (u — u* (t, x)) 
= Sri x,u) (ej (u — u*(t,2))) , 


j=l 











and each w; € C*(Q). 





